Abstract: Many common probability distributions in statistics, such as the Gaussian, multinomial, Beta or Gamma distributions, can be studied under the unified framework of exponential families. In this paper, we prove that both Rényi and Tsallis divergences of distributions belonging to the same exponential family admit a generic closed-form expression. Furthermore, we show that Rényi and Tsallis entropies can also be calculated in closed form for sub-families including the Gaussian or exponential distributions, among others.

Index Terms: Shannon entropy; Rényi entropies; Tsallis entropies; divergences; exponential families.

I. Introduction 

In 1948, Shannon published a theory of communication that initiated the field of information theory [1]. Nowadays, it is well known that Shannon entropy quantitatively measures the amount of uncertainty [2] of a random variable. The entropy $H(P)$ of a random variable $P$ is defined according to its underlying density $p(x)$ as

\begin{align}
H(P) &= \int p(x) \log \frac{1}{p(x)} \, dx \tag{1} \\
&= -\int p(x) \log p(x) \, dx = E_P[-\log p(x)]. \tag{2}
\end{align}



Closed-form expressions for the Shannon entropy of many continuous distributions are reported in [3]. In coding theory, one seeks codes that make the best use of the underlying structure of the message language. Since in practice the true model distribution $P$ is hidden by nature and therefore unknown to the observer, we instead define the cross-entropy between the considered model $Q$ and the unknown ideal random variable $P$ as

\begin{align}
H^\times(P:Q) &= E_P[-\log q(x)] = -\int p(x) \log q(x) \, dx \tag{3} \\
&\geq H^\times(P:P). \tag{4}
\end{align}

It follows that the Kullback-Leibler divergence (also called relative entropy) between two distributions $P$ and $Q$ is defined by

\[
\mathrm{KL}(P:Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx = E_P\left[\log \frac{p(x)}{q(x)}\right]. \tag{5}
\]
The Kullback-Leibler divergence $\mathrm{KL}(P:Q)$ is an oriented distance (i.e., $\mathrm{KL}(P:Q) \neq \mathrm{KL}(Q:P)$ in general, as emphasized by the ":" notational convention) that can be rewritten as

\[
\mathrm{KL}(P:Q) = H^\times(P:Q) - H(P) \geq 0. \tag{6}
\]



In 1961, Rényi generalized the Shannon entropy by modifying one of its axioms characterizing the averaging of information. The Rényi entropy $H^R_\alpha(p)$ [4] of a probability distribution $p$ is a single-parameter function defined by

\[
H^R_\alpha(p) = \frac{1}{1-\alpha} \log \int p^\alpha(x) \, dx, \quad \alpha \in (0,+\infty) \setminus \{1\}. \tag{7}
\]



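Let us prove using L'Hôpital's rule¹ that Rényi entropies tend to Shannon entropy $H(p) = -\int p(x) \log p(x) \, dx$ when $\alpha \to 1$. (This is a classical proof explained in textbooks that we include here to illustrate L'Hôpital's rule, which we shall use repeatedly later on.)

Proof: Consider the discrete case (i.e., counting measure) of Rényi and Shannon entropies. Set $f(\alpha) = \log \sum_{i=1}^n p_i^\alpha$ (for any fixed distribution $P$) and $g(\alpha) = 1 - \alpha$. Then $\frac{dg(\alpha)}{d\alpha} = -1$ and

\[
\frac{df(\alpha)}{d\alpha} = \frac{\sum_{i=1}^n \frac{d}{d\alpha}(p_i^\alpha)}{\sum_{i=1}^n p_i^\alpha}
\]

after applying the derivative chain rule. Since

\[
\frac{d}{d\alpha}(p_i^\alpha) = \frac{d}{d\alpha} e^{\alpha \log p_i} = (\log p_i) e^{\alpha \log p_i} = p_i^\alpha \log p_i,
\]

we get

\begin{align}
\frac{df(\alpha)}{d\alpha} &= \frac{\sum_{i=1}^n p_i^\alpha \log p_i}{\sum_{i=1}^n p_i^\alpha}, \tag{8} \\
\lim_{\alpha \to 1} \frac{df(\alpha)}{d\alpha} &= \sum_{i=1}^n p_i \log p_i. \tag{9}
\end{align}

Since $\lim_{\alpha \to 1} f(\alpha) = \lim_{\alpha \to 1} g(\alpha) = 0$ and $\lim_{\alpha \to 1} \frac{f'(\alpha)}{g'(\alpha)} = -\sum_{i=1}^n p_i \log p_i$, we deduce from L'Hôpital's rule that $\lim_{\alpha \to 1} H^R_\alpha(P) = H(P)$. That is, Rényi entropy tends to Shannon entropy as $\alpha \to 1$. ■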
Rényi entropies keep the additivity property of Shannon entropy [2] for independent systems, and are concave and monotonically decreasing functions of $\alpha$. Closed-form formulas for the Rényi entropies of many multivariate distributions are reported in [5], and for the multivariate Gaussian distribution in the technical report [6].
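The limit established in the proof above can be observed numerically. The following is a minimal sketch in the discrete case; the function names and the example distribution are illustrative choices, not part of the paper.

```python
import math

def shannon_entropy(p):
    """Shannon entropy (natural logarithm) of a discrete distribution p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1) of a discrete distribution p."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(pi ** alpha for pi in p if pi > 0)) / (1.0 - alpha)

# Arbitrary example distribution (an assumption made for illustration only).
p = [0.5, 0.25, 0.125, 0.125]
for alpha in (0.5, 0.9, 0.999, 1.001, 1.1, 2.0):
    print(alpha, renyi_entropy(p, alpha))
print("Shannon:", shannon_entropy(p))
```

As $\alpha$ approaches 1 from either side, the printed Rényi values approach the Shannon value, and they decrease as $\alpha$ grows.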

¹L'Hôpital's rule dates back to the 17th century, and states that the limit of an indeterminate ratio of functions equals the limit of the ratio of their derivatives, provided that (i) the limits of both the numerator and denominator coincide (here, both are zero), and (ii) the limit of the ratio of the derivatives exists. That is, if $\lim_{x \to a} f(x) = \lim_{x \to a} g(x) = 0$ and $\lim_{x \to a} f'(x)/g'(x) = l$ exists, then

\[
\lim_{x \to a} \frac{f(x)}{g(x)} = l. \tag{10}
\]
In 1988, Tsallis (motivated by physical multi-fractal systems) introduced yet another one-parameter generalization of Shannon entropy. Historically, this family of entropic functions was derived axiomatically by Havrda and Charvát [7] in 1967. The Tsallis entropies $H^T_\alpha(p)$ of a probability distribution $p$ are defined by

\[
H^T_\alpha(p) = \frac{1}{1-\alpha} \left( \int p(x)^\alpha \, dx - 1 \right), \quad \alpha \in \mathbb{R} \setminus \{1\}. \tag{11}
\]

Tsallis entropies are non-additive, tend to Shannon entropy when $\alpha \to 1$, and can be derived from the generalized Shannon-Khinchin axioms [8].

Let $I_\alpha(p) = \int p(x)^\alpha \, dx$; then Rényi and Tsallis entropies can be rewritten as

\begin{align}
H^R_\alpha(p) &= \frac{1}{1-\alpha} \log I_\alpha(p), \tag{12} \\
H^T_\alpha(p) &= \frac{I_\alpha(p) - 1}{1-\alpha}. \tag{13}
\end{align}

Since $I_\alpha(p) = (1-\alpha) H^T_\alpha(p) + 1 = e^{(1-\alpha) H^R_\alpha(p)}$, we can convert these two families of entropies through the following monotonic conversion functions:

\begin{align}
H^R_\alpha(p) &= \frac{\log\left((1-\alpha) H^T_\alpha(p) + 1\right)}{1-\alpha}, \tag{14} \\
H^T_\alpha(p) &= \frac{e^{(1-\alpha) H^R_\alpha(p)} - 1}{1-\alpha}. \tag{15}
\end{align}

II. Rényi and Tsallis entropies of exponential families

A random variable $X \sim \mathcal{E}_F(\theta)$ is said to belong to the exponential family $\mathcal{E}_F$ [9] when it admits the following canonical decomposition of its density:

\[
p_F(x;\theta) = \exp\left( \langle t(x), \theta \rangle - F(\theta) + k(x) \right), \tag{16}
\]

where $\langle x, y \rangle = x^T y$ denotes the inner product, $t(x)$ the sufficient statistics, $\theta$ the natural parameters, $F(\theta)$ a $C^\infty$ differentiable real-valued convex function, and $k(x)$ a carrier measure.

Since $F(\theta) = \log \int \exp(\langle t(x), \theta \rangle + k(x)) \, dx$ (because $\int p_F(x;\theta) \, dx = 1$), function $F$ is called the log-normalizer. Function $F$ characterizes the family, while the natural parameter $\theta$ denotes the member of the family $\mathcal{E}_F$.

A statistic is a function of the observations (say, the sample mean or sample variance) that collects information about the distribution with the goal of concentrating information for later inference. A statistic is said to be sufficient if it allows one to concentrate the information obtained from random observations without losing information, in the sense that working directly on the observation set or on its compact sufficient statistics yields exactly the same parameter estimation results. It can be shown from the Fisher-Neyman factorization theorem [2], under mild regularity conditions, that the class of distributions admitting sufficient statistics of fixed finite dimension is precisely the class of exponential families [10]. Term $k(x)$ is related to the carrier measure (i.e., counting or Lebesgue). An exponential family can be univariate (e.g., the Poisson or 1D Gaussian distributions) or multivariate (like the multinomial or $d$-dimensional Gaussian distributions). The order of an exponential family denotes the dimension of the parameter space. Thus the Gaussian (normal) distribution is univariate of order 2 (parameters $\mu$ and $\sigma$).

Many common distribution families such as Poisson, Gaussian or multinomial distributions are exponential families whose canonical decompositions $(F, t, \theta, k)$ are given in [10].

Let us prove that for any distribution belonging to the exponential families, we have the following entropy expressions:

\begin{align}
H^R_\alpha(p_F(x;\theta)) &= \frac{1}{1-\alpha} \left( F(\alpha\theta) - \alpha F(\theta) + \log E_p[e^{(\alpha-1)k(x)}] \right), \tag{17} \\
H^T_\alpha(p_F(x;\theta)) &= \frac{1}{1-\alpha} \left( e^{F(\alpha\theta) - \alpha F(\theta)} E_p[e^{(\alpha-1)k(x)}] - 1 \right). \tag{18}
\end{align}

Proof: Consider calculating the term $I_\alpha(p) = \int p(x)^\alpha \, dx$ for exponential families:

\begin{align}
I_\alpha(p) &= \int e^{\alpha(\langle t(x), \theta \rangle - F(\theta) + k(x))} \, dx \tag{19} \\
&= \int e^{\langle t(x), \alpha\theta \rangle - \alpha F(\theta) + \alpha k(x) + (1-\alpha)k(x) - (1-\alpha)k(x) + F(\alpha\theta) - F(\alpha\theta)} \, dx \tag{20} \\
&= \int e^{F(\alpha\theta) - \alpha F(\theta)} \, p_F(x;\alpha\theta) \, e^{(\alpha-1)k(x)} \, dx \tag{21} \\
&= e^{F(\alpha\theta) - \alpha F(\theta)} \int p_F(x;\alpha\theta) \, e^{(\alpha-1)k(x)} \, dx \tag{22} \\
&= e^{F(\alpha\theta) - \alpha F(\theta)} E_p[e^{(\alpha-1)k(x)}]. \tag{23}
\end{align}

As Eq. 22 shows, the expectation $E_p$ is taken with respect to the member $p_F(x;\alpha\theta)$ of the family, which requires $\alpha\theta$ to belong to the natural parameter space. The formulas for the Rényi and Tsallis entropies are then derived using Eq. 12 and Eq. 13.

■
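The identity of Eq. 23 and the conversion functions of Eq. 14 and Eq. 15 can be checked numerically on a family with non-zero carrier measure. The sketch below uses the Poisson family ($\theta = \log\lambda$, $F(\theta) = e^\theta$, $k(x) = -\log x!$); the function names and the truncation of the infinite sums to 200 terms are assumptions made for illustration.

```python
import math

def log_poisson_pmf(x, lam):
    """Logarithm of the Poisson probability mass lam^x e^{-lam} / x!."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def i_alpha_direct(lam, alpha, n_terms=200):
    """I_alpha(p) = sum_x p(x)^alpha, truncated to the first n_terms terms."""
    return sum(math.exp(alpha * log_poisson_pmf(x, lam)) for x in range(n_terms))

def i_alpha_exp_family(lam, alpha, n_terms=200):
    """Same quantity through Eq. 23, with theta = log(lam), F = exp, k(x) = -log x!."""
    theta = math.log(lam)
    lam_alpha = math.exp(alpha * theta)  # rate of the member with parameter alpha*theta
    # E[e^{(alpha-1) k(x)}] under p_F(x; alpha*theta); each term is formed in log-space.
    expectation = sum(
        math.exp(log_poisson_pmf(x, lam_alpha) - (alpha - 1.0) * math.lgamma(x + 1))
        for x in range(n_terms)
    )
    return math.exp(math.exp(alpha * theta) - alpha * math.exp(theta)) * expectation

def renyi_from_i(i_alpha, alpha):
    return math.log(i_alpha) / (1.0 - alpha)            # Eq. 12

def tsallis_from_i(i_alpha, alpha):
    return (i_alpha - 1.0) / (1.0 - alpha)              # Eq. 13

def renyi_from_tsallis(h_t, alpha):
    return math.log((1.0 - alpha) * h_t + 1.0) / (1.0 - alpha)   # Eq. 14

def tsallis_from_renyi(h_r, alpha):
    return (math.exp((1.0 - alpha) * h_r) - 1.0) / (1.0 - alpha)  # Eq. 15

for alpha in (0.7, 2.0):
    print(alpha, i_alpha_direct(3.0, alpha), i_alpha_exp_family(3.0, alpha))
```

Both routes sum the same terms, so they agree up to floating-point rounding; the second one makes explicit that the expectation is taken under the member with natural parameter $\alpha\theta$.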

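In particular, for standard carrier measure $k(x) = 0$ (e.g., Gaussian, exponential, Bernoulli or centered Laplacian), we obtain the following generic closed-form expressions of Rényi and Tsallis entropies:

\begin{align}
H^R_\alpha(p_F(x;\theta)) &= \frac{1}{1-\alpha} \left( F(\alpha\theta) - \alpha F(\theta) \right), \tag{24} \\
H^T_\alpha(p_F(x;\theta)) &= \frac{1}{1-\alpha} \left( e^{F(\alpha\theta) - \alpha F(\theta)} - 1 \right). \tag{25}
\end{align}

For $\alpha \to 1$, observe that both formulas yield the Shannon entropy for exponential families (with $k(x) = 0$):

\[
H(p_F(x;\theta)) = F(\theta) - \langle \theta, \nabla F(\theta) \rangle. \tag{26}
\]

Proof: Let us use L'Hôpital's rule on the Rényi entropy:

\begin{align}
\lim_{\alpha \to 1} H^R_\alpha(p_F(x;\theta)) &= \lim_{\alpha \to 1} \frac{1}{1-\alpha} \left( F(\alpha\theta) - \alpha F(\theta) \right) \tag{27} \\
&= \lim_{\alpha \to 1} \frac{\langle \theta, \nabla F(\alpha\theta) \rangle - F(\theta)}{-1} \tag{28} \\
&= F(\theta) - \langle \theta, \nabla F(\theta) \rangle \tag{29}
\end{align}

(using the Gâteaux derivative $\frac{d}{d\alpha} F(\alpha\theta) = \langle \theta, \nabla F(\alpha\theta) \rangle$). ■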

For a non-zero carrier measure, the Shannon entropy of a member $p \sim \mathcal{E}_F(\theta)$ of an exponential family is $H(p_F(x;\theta)) = F(\theta) - \langle \theta, \nabla F(\theta) \rangle - E_p[k(x)]$. This will be proved in Section IV.

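Example To illustrate the generic entropy formula, let us start with a simple exponential family: the exponential distribution. The exponential distribution models the time between two successive events of a Poisson process, and has density

\[
p(x;\lambda) = \lambda e^{-\lambda x}, \quad x \geq 0, \tag{30}
\]

where $\lambda > 0$ is called the rate parameter.

Writing $\lambda e^{-\lambda x} = e^{-\lambda x + \log \lambda}$, we get the canonical decomposition of exponential families with $t(x) = x$, $\theta = -\lambda$, $F(\theta) = -\log \lambda = -\log(-\theta)$ and $k(x) = 0$. The exponential distribution is a univariate exponential family of order 1. The Rényi entropy is $H^R_\alpha(p) = \frac{1}{1-\alpha}(F(\alpha\theta) - \alpha F(\theta)) = \frac{1}{1-\alpha}(-\log \alpha\lambda + \alpha \log \lambda) = -\log \lambda - \frac{\log \alpha}{1-\alpha}$. The log-normalizer derivative is $F'(\theta) = -\frac{1}{\theta} = \frac{1}{\lambda}$. The Shannon entropy is $H(p) = F(\theta) - \theta F'(\theta) = 1 - \log \lambda$. Using L'Hôpital's rule, we find that $\lim_{\alpha \to 1} H^R_\alpha(p) = -\log \lambda - \lim_{\alpha \to 1} \frac{\log \alpha}{1-\alpha} = 1 - \log \lambda = H(p)$. The Tsallis entropy is $H^T_\alpha(p) = \frac{1}{1-\alpha}\left(e^{F(\alpha\theta) - \alpha F(\theta)} - 1\right) = \frac{1}{1-\alpha}\left(\frac{\lambda^{\alpha-1}}{\alpha} - 1\right)$. Again, using L'Hôpital's rule, we find that the Tsallis entropy converges to Shannon entropy as $\alpha \to 1$: $\lim_{\alpha \to 1} H^T_\alpha(p) = \lim_{\alpha \to 1} \frac{\frac{\lambda^{\alpha-1} \log \lambda}{\alpha} - \frac{\lambda^{\alpha-1}}{\alpha^2}}{-1} = 1 - \log \lambda = H(p)$ (where the derivatives are computed with respect to parameter $\alpha$).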
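The closed forms of this example can be compared against a direct numerical integration of $\int p(x)^\alpha dx$. This is a sketch under stated assumptions: the midpoint rule, the truncation of the integration range, and all function names are illustrative choices.

```python
import math

def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def i_alpha_numeric(lam, alpha):
    """Numerical I_alpha(p) for the density lam * exp(-lam * x) on [0, inf)."""
    upper = 50.0 / (alpha * lam)  # the integrand is below lam^alpha * e^{-50} beyond this point
    return integrate(lambda x: (lam * math.exp(-lam * x)) ** alpha, 0.0, upper)

def renyi_closed(lam, alpha):
    return -math.log(lam) - math.log(alpha) / (1.0 - alpha)

def tsallis_closed(lam, alpha):
    return (lam ** (alpha - 1.0) / alpha - 1.0) / (1.0 - alpha)

def shannon_closed(lam):
    return 1.0 - math.log(lam)

lam = 2.0
for alpha in (0.5, 2.0):
    i_alpha = i_alpha_numeric(lam, alpha)
    print(alpha, math.log(i_alpha) / (1.0 - alpha), renyi_closed(lam, alpha))
```

Evaluating `renyi_closed` or `tsallis_closed` at values of `alpha` close to 1 gives values close to `shannon_closed(lam)`, in line with the limits derived above.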

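Example Let us now consider the usual Gaussian distribution (univariate of order 2) with density

\[
p(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right). \tag{31}
\]

Its canonical decomposition into an exponential family yields

\begin{align}
p(x;\mu,\sigma) &= \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} - \frac{1}{2}\log 2\pi\sigma^2 \right) \tag{32} \\
&= \exp\left( -\frac{x^2}{2\sigma^2} + \frac{x\mu}{\sigma^2} - \frac{\mu^2}{2\sigma^2} - \frac{1}{2}\log 2\pi\sigma^2 \right) \tag{33} \\
&= \exp\left( \left\langle (x, x^2), \left( \frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2} \right) \right\rangle - F(\theta) \right). \tag{34}
\end{align}

• Sufficient statistics: $t(x) = (x, x^2)$,

• Natural parameters: $\theta = (\theta_1, \theta_2) = \left( \frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2} \right)$,

• Log-normalizer: $F(\theta) = -\frac{\theta_1^2}{4\theta_2} + \frac{1}{2}\log\left(-\frac{\pi}{\theta_2}\right) = \frac{\mu^2}{2\sigma^2} + \frac{1}{2}\log 2\pi\sigma^2$, and

• Carrier measure: $k(x) = 0$.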
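The canonical decomposition listed above can be sanity-checked numerically: the log-normalizer evaluated at the natural parameters should match its expression in $(\mu, \sigma)$, and its gradient should equal the expected sufficient statistics $(E[x], E[x^2]) = (\mu, \mu^2 + \sigma^2)$. The sketch below uses central finite differences; the step size and function names are assumptions for illustration.

```python
import math

def natural_params(mu, sigma):
    """Natural parameters (theta1, theta2) of the univariate Gaussian."""
    return (mu / sigma ** 2, -1.0 / (2.0 * sigma ** 2))

def log_normalizer(theta):
    """F(theta) = -theta1^2 / (4 theta2) + (1/2) log(-pi / theta2), for theta2 < 0."""
    t1, t2 = theta
    return -t1 * t1 / (4.0 * t2) + 0.5 * math.log(-math.pi / t2)

def grad_log_normalizer(theta, h=1e-6):
    """Central finite-difference approximation of the gradient of F."""
    t1, t2 = theta
    d1 = (log_normalizer((t1 + h, t2)) - log_normalizer((t1 - h, t2))) / (2.0 * h)
    d2 = (log_normalizer((t1, t2 + h)) - log_normalizer((t1, t2 - h))) / (2.0 * h)
    return (d1, d2)

mu, sigma = 1.5, 0.8
theta = natural_params(mu, sigma)
print(log_normalizer(theta), mu ** 2 / (2 * sigma ** 2) + 0.5 * math.log(2 * math.pi * sigma ** 2))
print(grad_log_normalizer(theta), (mu, mu ** 2 + sigma ** 2))
```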

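Thus the Rényi entropy of Eq. 24 instantiated for the Gaussian is

\begin{align}
H^R_\alpha(p) &= \frac{1}{1-\alpha} \left( F(\alpha\theta) - \alpha F(\theta) \right) \tag{35} \\
&= \frac{1}{1-\alpha} \left( \frac{\alpha\mu^2}{2\sigma^2} + \frac{1}{2}\log\frac{2\pi\sigma^2}{\alpha} - \alpha\left( \frac{\mu^2}{2\sigma^2} + \frac{1}{2}\log 2\pi\sigma^2 \right) \right) \tag{36} \\
&= \frac{1}{1-\alpha} \left( \frac{1-\alpha}{2}\log 2\pi\sigma^2 - \frac{1}{2}\log\alpha \right) \tag{37} \\
&= \frac{1}{2}\log 2\pi\sigma^2 - \frac{\log\alpha}{2(1-\alpha)}. \tag{38}
\end{align}

When $\alpha \to 1$, $H^R_\alpha(p) \to \frac{1}{2}\log 2\pi e\sigma^2$ (using L'Hôpital's rule, $\frac{\log\alpha}{2(\alpha-1)} \to_{\alpha \to 1} \frac{1}{2}$). Now using the Shannon closed-form entropy of Eq. 26 with $\nabla F(\theta) = \left( -\frac{\theta_1}{2\theta_2}, \frac{\theta_1^2}{4\theta_2^2} - \frac{1}{2\theta_2} \right) = (\mu, \mu^2 + \sigma^2)$, we again find the Gaussian entropy $H(p) = \frac{1}{2}\log 2\pi e\sigma^2$:

\begin{align}
H(p) &= F(\theta) - \langle \theta, \nabla F(\theta) \rangle \tag{39} \\
&= \frac{\mu^2}{2\sigma^2} + \frac{1}{2}\log 2\pi\sigma^2 - \left( \frac{\mu^2}{\sigma^2} - \frac{\mu^2 + \sigma^2}{2\sigma^2} \right) \tag{40} \\
&= \frac{1}{2}\log 2\pi\sigma^2 + \frac{1}{2} \tag{41} \\
&= \frac{1}{2}\log 2\pi e\sigma^2. \tag{42}
\end{align}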

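It follows from the conversion formula of Eq. 15 that the Tsallis entropy of the Gaussian is

\[
H^T_\alpha(p) = \frac{e^{(1-\alpha) H^R_\alpha(p)} - 1}{1-\alpha} = \frac{(2\pi\sigma^2)^{\frac{1-\alpha}{2}} \alpha^{-\frac{1}{2}} - 1}{1-\alpha}. \tag{43}
\]

Again, we check that when $\alpha \to 1$, the Tsallis entropy tends to Shannon entropy. A first-order expansion in $1-\alpha$ gives $(2\pi\sigma^2)^{\frac{1-\alpha}{2}} \simeq 1 + \frac{1-\alpha}{2}\log 2\pi\sigma^2$ and $\alpha^{-\frac{1}{2}} \simeq 1 + \frac{1-\alpha}{2}$, so that

\begin{align}
\lim_{\alpha \to 1} H^T_\alpha(p) &= \lim_{\alpha \to 1} \frac{1 + \frac{1-\alpha}{2}\left( \log 2\pi\sigma^2 + 1 \right) - 1}{1-\alpha} \tag{44} \\
&= \frac{1}{2}\log 2\pi e\sigma^2. \tag{45}
\end{align}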
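The Gaussian closed forms can be compared with a direct numerical evaluation of $I_\alpha(p) = \int p(x)^\alpha dx$. In this sketch, the Tsallis value is obtained by applying the conversion of Eq. 15 to the Rényi closed form; the integration range, the midpoint rule and the function names are illustrative assumptions.

```python
import math

def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def i_alpha_numeric(mu, sigma, alpha):
    """Numerical I_alpha(p); p^alpha decays like a Gaussian of deviation sigma / sqrt(alpha)."""
    half_width = 12.0 * sigma / math.sqrt(min(alpha, 1.0))
    return integrate(lambda x: gaussian_pdf(x, mu, sigma) ** alpha,
                     mu - half_width, mu + half_width)

def renyi_closed(sigma, alpha):
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + math.log(alpha) / (2.0 * (alpha - 1.0))

def tsallis_from_renyi(h_r, alpha):
    return (math.exp((1.0 - alpha) * h_r) - 1.0) / (1.0 - alpha)  # Eq. 15

def shannon_closed(sigma):
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

mu, sigma = 1.5, 0.8
for alpha in (0.5, 3.0):
    i_alpha = i_alpha_numeric(mu, sigma, alpha)
    print(alpha, math.log(i_alpha) / (1.0 - alpha), renyi_closed(sigma, alpha))
```

The mean `mu` only shifts the integrand, so the numerical values do not depend on it, as the closed form indicates.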




Similarly, but with greater matrix calculus complexity, based on the canonical decomposition reported in [10], we may consider the multivariate Gaussian distribution $X \sim N(\mu, \Sigma)$ with mean $\mu$ and covariance matrix $\Sigma$ ($\det \Sigma = |\Sigma| > 0$). Appendix A provides the calculus details. We report the result here. The Rényi $\alpha$-entropy is given by

\[
H^R_\alpha(X) = \frac{d}{2}\log 2\pi + \frac{1}{2}\log|\Sigma| + \frac{d\log\alpha}{2(\alpha-1)} \tag{46}
\]

and tends to Shannon entropy as $\alpha \to 1$ (using L'Hôpital's rule, $\frac{d\log\alpha}{2(\alpha-1)} \to_{\alpha \to 1} \frac{d}{2}$):

\[
H(X) = \frac{1}{2}\log (2\pi e)^d |\Sigma|. \tag{47}
\]

Note that the sufficient statistics $t(x)$ do not appear in the entropy formula. The sufficient statistics play a role in estimating the parameter $\theta$ from independent and identically distributed (i.i.d.) observations, as mentioned in the concluding remarks (see Section V).
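III. Rényi and Tsallis divergences of exponential families

Consider now two probability distributions $P$ and $Q$, and define the Rényi $D^R_\alpha(p:q)$ and Tsallis $D^T_\alpha(p:q)$ divergences as follows:

\begin{align}
D^R_\alpha(p:q) &= \frac{1}{\alpha-1} \log \int p(x)^\alpha q(x)^{1-\alpha} \, dx, \tag{48} \\
D^T_\alpha(p:q) &= \frac{1}{\alpha-1} \left( \int p(x)^\alpha q(x)^{1-\alpha} \, dx - 1 \right). \tag{49}
\end{align}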

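Those divergences are related to the $\alpha$-divergence²

\[
I_\alpha(p:q) = \int p(x)^\alpha q(x)^{1-\alpha} \, dx \tag{50}
\]

that plays an important role³ in information geometry [12]:

\begin{align}
D^R_\alpha(p:q) &= \frac{\log I_\alpha(p:q)}{\alpha-1}, \tag{51} \\
D^T_\alpha(p:q) &= \frac{I_\alpha(p:q) - 1}{\alpha-1}. \tag{52}
\end{align}

Rényi divergence can also be rewritten as

\[
D^R_\alpha(p:q) = \frac{\log \int p(x)^\alpha q(x)^{1-\alpha} \, dx}{\alpha-1} = \frac{\log \int \left( \frac{p(x)}{q(x)} \right)^\alpha q(x) \, dx}{\alpha-1}, \tag{53}
\]

which shows that it is a monotone function of a Csiszár $f$-divergence [13] (with generator $f(u) = u^\alpha$). The special case $\alpha = \frac{1}{2}$ yields

\[
D^R_{\frac{1}{2}}(p:q) = -2\log \int \sqrt{p(x)}\sqrt{q(x)} \, dx = -2\log B(p,q), \tag{54}
\]

where $B(p,q) = \int \sqrt{p(x)}\sqrt{q(x)} \, dx$ is called the Bhattacharyya coefficient [14]. The Bhattacharyya coefficient is itself related to the (squared) Hellinger distance [15]:

\begin{align}
H^2(p,q) &= \frac{1}{2} \int \left( \sqrt{p(x)} - \sqrt{q(x)} \right)^2 dx \tag{55} \\
&= \frac{1}{2} \left( \int p(x) \, dx + \int q(x) \, dx - 2\int \sqrt{p(x)}\sqrt{q(x)} \, dx \right) \tag{56} \\
&= 1 - B(p,q). \tag{57}
\end{align}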

For members of the same exponential family (we no longer require the carrier measure $k(x)$ to be zero), the Rényi and Tsallis divergences [16] can always be calculated from the following closed-form solution:

\begin{align}
D^R_\alpha(p_F(x;\theta) : p_F(x;\theta')) &= \frac{1}{1-\alpha} J_{F,\alpha}(\theta:\theta'), \tag{58} \\
D^T_\alpha(p_F(x;\theta) : p_F(x;\theta')) &= \frac{1}{\alpha-1} \left( e^{-J_{F,\alpha}(\theta:\theta')} - 1 \right), \tag{59}
\end{align}

where

\[
J_{F,\alpha}(\theta:\theta') = \alpha F(\theta) + (1-\alpha) F(\theta') - F(\alpha\theta + (1-\alpha)\theta') \tag{60}
\]

is the skew divergence based on the Jensen gap obtained from the log-normalizer convex function $F$. $J_{F,\alpha}(\theta:\theta')$ is non-negative for $\alpha \in [0,1]$ and non-positive for $\alpha \in (-\infty, 0] \cup [1, \infty)$. It loses discriminatory power (i.e., $J_{F,\alpha}(\theta:\theta') = 0$, $\forall \theta, \theta'$) for $\alpha \in \{0, 1\}$.

²Historically, this divergence was first presented by Chernoff in [11].
³Namely, the role of canonical divergence in constant curvature statistical manifolds.

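Proof: Let us consider computing $I_\alpha(p:q) = I_\alpha(\theta:\theta')$ for members $p \sim \mathcal{E}_F(\theta)$ and $q \sim \mathcal{E}_F(\theta')$ of the same exponential family $\mathcal{E}_F$:

\begin{align}
I_\alpha(p:q) &= \int p(x)^\alpha q(x)^{1-\alpha} \, dx \tag{61} \\
&= \int e^{\alpha(\langle t(x), \theta \rangle - F(\theta) + k(x))} \, e^{(1-\alpha)(\langle t(x), \theta' \rangle - F(\theta') + k(x))} \, dx \tag{62-63} \\
&= \int e^{\langle t(x), \alpha\theta + (1-\alpha)\theta' \rangle - \alpha F(\theta) - (1-\alpha) F(\theta') + k(x)} \, dx \tag{64-65} \\
&= \int e^{F(\alpha\theta + (1-\alpha)\theta') - \alpha F(\theta) - (1-\alpha) F(\theta')} \, p_F(x; \alpha\theta + (1-\alpha)\theta') \, dx \tag{66-67} \\
&= e^{-J_{F,\alpha}(\theta:\theta')} \int p_F(x; \alpha\theta + (1-\alpha)\theta') \, dx \tag{68} \\
&= e^{-J_{F,\alpha}(\theta:\theta')}. \tag{69}
\end{align}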



Thus the Rényi divergence of members of the same exponential family amounts to computing a scaled skew Jensen divergence for the log-normalizer:

\[
D^R_\alpha(p:q) = \frac{J_{F,\alpha}(\theta:\theta')}{1-\alpha}. \tag{70}
\]

Note that for $\alpha > 1$, we have both $1-\alpha < 0$ and $J_{F,\alpha}(\theta:\theta') \leq 0$, so that the Rényi divergence is non-negative. (However, for $\alpha < 0$, $1-\alpha > 0$ but $J_{F,\alpha}(\theta:\theta') \leq 0$. This shows that Rényi divergences are defined for $\alpha \in (0,\infty) \setminus \{1\}$.)
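The closed form above does not involve the carrier measure, which cancels in $p^\alpha q^{1-\alpha}$. A minimal numerical check on the Poisson family ($\theta = \log\lambda$, $F(\theta) = e^\theta$) is sketched below; the truncation of the sums to 200 terms and the function names are assumptions for illustration.

```python
import math

def log_poisson_pmf(x, lam):
    """Logarithm of the Poisson probability mass lam^x e^{-lam} / x!."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def skew_jensen(F, theta_p, theta_q, alpha):
    """J_{F,alpha}(theta_p : theta_q) for a scalar natural parameter."""
    return (alpha * F(theta_p) + (1.0 - alpha) * F(theta_q)
            - F(alpha * theta_p + (1.0 - alpha) * theta_q))

def renyi_divergence_closed(lam_p, lam_q, alpha):
    j = skew_jensen(math.exp, math.log(lam_p), math.log(lam_q), alpha)
    return j / (1.0 - alpha)

def tsallis_divergence_closed(lam_p, lam_q, alpha):
    j = skew_jensen(math.exp, math.log(lam_p), math.log(lam_q), alpha)
    return (math.exp(-j) - 1.0) / (alpha - 1.0)

def i_alpha_direct(lam_p, lam_q, alpha, n_terms=200):
    """sum_x p(x)^alpha q(x)^(1-alpha), each term formed in log-space."""
    return sum(
        math.exp(alpha * log_poisson_pmf(x, lam_p)
                 + (1.0 - alpha) * log_poisson_pmf(x, lam_q))
        for x in range(n_terms)
    )

for alpha in (0.5, 2.0):
    i_alpha = i_alpha_direct(3.0, 5.0, alpha)
    print(alpha, math.log(i_alpha) / (alpha - 1.0), renyi_divergence_closed(3.0, 5.0, alpha))
```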

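The formula for the Tsallis divergence follows from the following conversion formula:

\[
D^T_\alpha(p:q) = \frac{e^{(\alpha-1) D^R_\alpha(p:q)} - 1}{\alpha-1} = \frac{e^{-J_{F,\alpha}(\theta:\theta')} - 1}{\alpha-1}. \tag{71}
\]

Observe that $D^T_\alpha(p:q) \simeq D^R_\alpha(p:q) \to_{\alpha \to 1} \mathrm{KL}(p:q)$ (using the argument $e^x \simeq_{x \to 0} 1 + x$). When $\alpha \to 1$, we get the well-known result [17], [18] that

\[
\mathrm{KL}(p_F(x;\theta) : p_F(x;\theta')) = B_F(\theta':\theta), \tag{72}
\]

where $B_F$ is the Bregman divergence [19] defined by

\[
B_F(p:q) = F(p) - F(q) - \langle p - q, \nabla F(q) \rangle. \tag{73}
\]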
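The identity of Eq. 72 can be illustrated on exponential distributions, for which $\theta = -\lambda$ and $F(\theta) = -\log(-\theta)$. The sketch below compares a numerically integrated Kullback-Leibler divergence with the Bregman divergence on swapped parameters, and evaluates the Rényi closed form near $\alpha = 1$; the integration scheme and the function names are assumptions for illustration.

```python
import math

def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def F(theta):
    """Log-normalizer of the exponential distribution (theta = -lambda < 0)."""
    return -math.log(-theta)

def grad_F(theta):
    return -1.0 / theta

def bregman(theta_a, theta_b):
    """B_F(theta_a : theta_b) = F(a) - F(b) - (a - b) F'(b)."""
    return F(theta_a) - F(theta_b) - (theta_a - theta_b) * grad_F(theta_b)

def kl_numeric(lam_p, lam_q):
    """Numerical KL(p : q) between exponential densities with rates lam_p and lam_q."""
    def integrand(x):
        p = lam_p * math.exp(-lam_p * x)
        return p * (math.log(lam_p / lam_q) - (lam_p - lam_q) * x)  # p * log(p / q)
    return integrate(integrand, 0.0, 60.0 / lam_p)

def renyi_divergence_closed(lam_p, lam_q, alpha):
    theta_p, theta_q = -lam_p, -lam_q
    j = (alpha * F(theta_p) + (1.0 - alpha) * F(theta_q)
         - F(alpha * theta_p + (1.0 - alpha) * theta_q))
    return j / (1.0 - alpha)

lam_p, lam_q = 2.0, 3.0
print(kl_numeric(lam_p, lam_q), bregman(-lam_q, -lam_p))
print(renyi_divergence_closed(lam_p, lam_q, 0.999999))
```

Note the swapped argument order: the divergence from $p$ to $q$ equals the Bregman divergence from $\theta'$ to $\theta$.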

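Proof: Consider the limit case of the Rényi divergence of members of the same exponential family as $\alpha \to 1$. We shall use the following first-order Taylor expansion (Gâteaux derivative) of a skew Jensen divergence:

\begin{align}
J_{F,\alpha}(\theta:\theta') &= \alpha F(\theta) + (1-\alpha) F(\theta') - F(\alpha\theta + (1-\alpha)\theta') \tag{74-75} \\
&\simeq_{\alpha \to 1} (1-\alpha) F(\theta') - (1-\alpha) \langle \theta' - \theta, \nabla F(\theta) \rangle - (1-\alpha) F(\theta), \tag{76-77}
\end{align}

since $F(\alpha\theta + (1-\alpha)\theta') = F(\theta + (1-\alpha)(\theta' - \theta)) \simeq F(\theta) + (1-\alpha) \langle \theta' - \theta, \nabla F(\theta) \rangle$. It follows that

\begin{align}
\lim_{\alpha \to 1} D^R_\alpha(p:q) &= \lim_{\alpha \to 1} \frac{J_{F,\alpha}(\theta:\theta')}{1-\alpha} \tag{78} \\
&= F(\theta') - F(\theta) - \langle \theta' - \theta, \nabla F(\theta) \rangle \tag{79} \\
&= B_F(\theta':\theta) = \mathrm{KL}(p:q). \tag{80}
\end{align}

A direct alternative proof is also given in Appendix B.

Example Consider the exponential distribution. We recall that the natural parameter is $\theta = -\lambda$ and the log-normalizer is $F(\theta) = -\log(-\theta) = -\log \lambda$. For two members $p \sim \mathcal{E}_F(\theta)$ and $q \sim \mathcal{E}_F(\theta')$ of the same family of exponential distributions, we have the Rényi divergence $D^R_\alpha(p:q) = \frac{1}{1-\alpha}\left( \alpha F(\theta) + (1-\alpha) F(\theta') - F(\alpha\theta + (1-\alpha)\theta') \right) = \frac{1}{1-\alpha} \log \frac{\alpha\lambda + (1-\alpha)\lambda'}{\lambda^\alpha \lambda'^{1-\alpha}}$. The Tsallis divergence is $D^T_\alpha(p:q) = \frac{1}{1-\alpha}\left( 1 - \frac{\lambda^\alpha \lambda'^{1-\alpha}}{\alpha\lambda + (1-\alpha)\lambda'} \right)$.

IV. Shannon entropy and cross-entropy for the exponential families

Let us now prove that the Shannon entropy and cross-entropy of distributions belonging to the same exponential family can be expressed as

\begin{align}
H(p) &= F(\theta) - \langle \theta, \nabla F(\theta) \rangle - E_\theta[k(x)], \tag{81} \\
H^\times(p:q) &= F(\theta') - \langle \theta', \nabla F(\theta) \rangle - E_\theta[k(x)]. \tag{82}
\end{align}

Proof: Write the relative entropy as the difference of the cross-entropy minus the entropy:

\[
\mathrm{KL}(p:q) = H^\times(p:q) - H(p). \tag{83}
\]

For distributions belonging to the same exponential family, we can separate the terms that involve $q$ (i.e., $\theta'$) from the terms that depend only on $p$ (i.e., $\theta$), to get

\begin{align}
\mathrm{KL}(p:q) &= B_F(\theta':\theta) \\
&= F(\theta') - F(\theta) - \langle \theta' - \theta, \nabla F(\theta) \rangle \\
&= \underbrace{F(\theta') - \langle \theta', \nabla F(\theta) \rangle}_{H^\times(p:q)} - \underbrace{\left( F(\theta) - \langle \theta, \nabla F(\theta) \rangle \right)}_{H(p)}.
\end{align}

Since the Bregman convex generator $F$ is defined up to an affine term $ax + b$ in the Bregman divergence, and since the factor $a$ leaves both the entropy and cross-entropy terms unchanged, we deduce that

\[
H(p) = H_F(\theta) = F(\theta) - \langle \theta, \nabla F(\theta) \rangle + b, \tag{84}
\]

where $b$ is a constant. To determine explicitly the entropic normalization additive constant $b$, we proceed as follows (using $\nabla F(\theta) = E_\theta[t(x)]$):

\begin{align}
H_F(\theta) &= -\int p_F(x;\theta) \log p_F(x;\theta) \, dx \tag{85} \\
&= F(\theta) - \int p_F(x;\theta) \langle t(x), \theta \rangle \, dx - \int k(x) p_F(x;\theta) \, dx \tag{86} \\
&= F(\theta) - \left\langle \int p_F(x;\theta) t(x) \, dx, \theta \right\rangle - \int k(x) p_F(x;\theta) \, dx \tag{87} \\
&= F(\theta) - \langle \theta, \nabla F(\theta) \rangle + b. \tag{88}
\end{align}

That is, the constant is given by $b = -\int k(x) p_F(x;\theta) \, dx = -E_\theta[k(x)]$. (It depends on the member $\theta$ of the family for $k(x) \neq 0$.) ■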



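Example Consider the Poisson distribution with probability mass function $p(x;\lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$. The canonical decomposition yields $\theta = \log \lambda$, $F(\theta) = \exp\theta = \lambda$ (the derivative is $F'(\theta) = \exp\theta = \lambda$), $t(x) = x$ and $k(x) = -\log x!$. The Poisson entropy is therefore $F(\theta) - \theta F'(\theta) + b = \lambda(1 - \log\lambda) - E[k(x)]$. Since $k(x) = -\log x!$, we have

\[
b = -E[k(x)] = \sum_{k=0}^{\infty} p_F(k;\lambda) \log k! = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k \log k!}{k!}.
\]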
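The Poisson entropy expression of this example can be compared with the entropy computed directly from its definition. Both sums are truncated to 200 terms in this sketch, which is an assumption suitable for moderate rates only; the function names are illustrative.

```python
import math

def log_poisson_pmf(x, lam):
    """Logarithm of the Poisson probability mass lam^x e^{-lam} / x!."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def poisson_entropy_direct(lam, n_terms=200):
    """-sum_x p(x) log p(x), truncated to the first n_terms terms."""
    total = 0.0
    for x in range(n_terms):
        log_p = log_poisson_pmf(x, lam)
        total -= math.exp(log_p) * log_p
    return total

def poisson_entropy_exp_family(lam, n_terms=200):
    """F(theta) - theta F'(theta) + b, with b = E[log x!] = -E[k(x)]."""
    theta = math.log(lam)
    f_value = math.exp(theta)      # F(theta) = exp(theta)
    f_prime = math.exp(theta)      # F'(theta) = exp(theta)
    b = sum(math.exp(log_poisson_pmf(x, lam)) * math.lgamma(x + 1)
            for x in range(n_terms))
    return f_value - theta * f_prime + b

print(poisson_entropy_direct(4.0), poisson_entropy_exp_family(4.0))
```

The constant `b` has no simpler closed form here, which illustrates why a non-zero carrier measure prevents a fully closed-form entropy.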







V. Summary, conclusion and discussion 

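In this paper, we have given closed-form expressions for the Rényi and Tsallis divergences of distributions $p \sim \mathcal{E}_F(\theta)$ and $q \sim \mathcal{E}_F(\theta')$ belonging to the same exponential family $\mathcal{E}_F$:

\begin{align}
D^R_\alpha(p:q) &= \frac{J_{F,\alpha}(\theta:\theta')}{1-\alpha}, \tag{89} \\
D^T_\alpha(p:q) &= \frac{e^{-J_{F,\alpha}(\theta:\theta')} - 1}{\alpha-1}, \tag{90} \\
\mathrm{KL}(p:q) &= \lim_{\alpha \to 1} D^R_\alpha(p:q) \tag{91} \\
&= \lim_{\alpha \to 1} D^T_\alpha(p:q) = B_F(\theta':\theta), \tag{92}
\end{align}

where

\begin{align}
J_{F,\alpha}(\theta:\theta') &= \alpha F(\theta) + (1-\alpha) F(\theta') - F(\alpha\theta + (1-\alpha)\theta') \tag{93} \\
&= J_{F,1-\alpha}(\theta':\theta) \tag{94}
\end{align}

is the skew Jensen divergence. Since the Rényi divergence for $\alpha = \frac{1}{2}$ is related to the Bhattacharyya coefficient and Hellinger distance, this also yields closed-form expressions for members of the same exponential family:

\begin{align}
B(p,q) &= e^{-J_{F,\frac{1}{2}}(\theta:\theta')}, \tag{95} \\
H(p,q) &= \sqrt{1 - e^{-J_{F,\frac{1}{2}}(\theta:\theta')}}. \tag{96}
\end{align}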
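As an illustration of the closed-form Bhattacharyya coefficient and Hellinger distance, the sketch below evaluates them for two exponential distributions ($\theta = -\lambda$, $F(\theta) = -\log(-\theta)$) and compares the coefficient with a numerical integration of $\int \sqrt{p(x) q(x)} \, dx$. The integration scheme and the function names are assumptions for illustration.

```python
import math

def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def log_normalizer(theta):
    """F(theta) = -log(-theta) for the exponential distribution."""
    return -math.log(-theta)

def bhattacharyya_closed(lam_p, lam_q):
    """B(p, q) = exp(-J_{F,1/2}(theta : theta'))."""
    theta_p, theta_q = -lam_p, -lam_q
    j_half = (0.5 * log_normalizer(theta_p) + 0.5 * log_normalizer(theta_q)
              - log_normalizer(0.5 * (theta_p + theta_q)))
    return math.exp(-j_half)

def bhattacharyya_numeric(lam_p, lam_q):
    """Numerical integral of sqrt(p q); the integrand decays as exp(-(lam_p + lam_q) x / 2)."""
    upper = 100.0 / (lam_p + lam_q)
    def integrand(x):
        return math.sqrt(lam_p * lam_q) * math.exp(-0.5 * (lam_p + lam_q) * x)
    return integrate(integrand, 0.0, upper)

def hellinger(b_coefficient):
    """Hellinger distance sqrt(1 - B); the max guards against tiny negative rounding."""
    return math.sqrt(max(0.0, 1.0 - b_coefficient))

b_closed = bhattacharyya_closed(2.0, 3.0)
print(b_closed, bhattacharyya_numeric(2.0, 3.0), hellinger(b_closed))
```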



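Furthermore, we showed that the Rényi and Tsallis entropies, including Shannon entropy in the limit case, can be expressed respectively as

\begin{align}
H^R_\alpha(p_F(x;\theta)) &= \frac{1}{1-\alpha} \left( F(\alpha\theta) - \alpha F(\theta) + \log E_p[e^{(\alpha-1)k(x)}] \right), \tag{97} \\
H^T_\alpha(p_F(x;\theta)) &= \frac{1}{1-\alpha} \left( e^{F(\alpha\theta) - \alpha F(\theta)} E_p[e^{(\alpha-1)k(x)}] - 1 \right), \tag{98} \\
H(p_F(x;\theta)) &= F(\theta) - \langle \theta, \nabla F(\theta) \rangle - E_p[k(x)]. \tag{99}
\end{align}

The Shannon cross-entropy is given by

\[
H^\times(p_F(x;\theta) : p_F(x;\theta')) = F(\theta') - \langle \theta', \nabla F(\theta) \rangle - E_p[k(x)]. \tag{100}
\]

Thus these entropies admit closed-form formulas whenever the normalizing carrier measure is zero ($k(x) = 0$):

\begin{align}
H^R_\alpha(p_F(x;\theta)) &= \frac{1}{1-\alpha} \left( F(\alpha\theta) - \alpha F(\theta) \right), \tag{101} \\
H^T_\alpha(p_F(x;\theta)) &= \frac{1}{1-\alpha} \left( e^{F(\alpha\theta) - \alpha F(\theta)} - 1 \right), \tag{102} \\
H(p_F(x;\theta)) &= F(\theta) - \langle \theta, \nabla F(\theta) \rangle. \tag{103}
\end{align}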

This includes the case of Bernoulli, exponential, Gaussian and centered Laplacian distributions, among others. (We report in Appendix A the Rényi entropy for multivariate Gaussian distributions using matrix calculus.)

Recently, Póczos and Schneider [20] have proposed a technique to estimate the $\alpha$-divergence based on the $k$-nearest neighbor graph. Although applicable to any kind of distribution, their method is computationally intensive and limited in practice to small dimensions. In contrast, we may estimate the Rényi entropy and divergence of distributions belonging to the same exponential family by applying the closed-form expressions to the estimates of the distribution parameters. This is all the more efficient as the maximum likelihood estimator (MLE) of exponential families for independent and identically distributed (i.i.d.) observations $x_1, \ldots, x_n$ is also available in closed form:

\[
\hat\theta = (\nabla F)^{-1}\left( \frac{1}{n} \sum_{i=1}^n t(x_i) \right). \tag{104}
\]

This estimate $\hat\theta$ is termed the observed point in information geometry [12].
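The plug-in strategy described above can be sketched for the exponential distribution, where $t(x) = x$ and $\nabla F(\theta) = -1/\theta$, so that the maximum likelihood estimate is obtained by inverting the gradient of the log-normalizer at the average sufficient statistic. The sample size, the seed, the true rate and the function names are assumptions for illustration.

```python
import math
import random

def mle_rate(samples):
    """MLE of the rate: theta_hat solves grad F(theta_hat) = mean of t(x_i), with t(x) = x."""
    eta = sum(samples) / len(samples)  # average sufficient statistic
    theta_hat = -1.0 / eta             # inverts grad F(theta) = -1 / theta
    return -theta_hat                  # rate parameter lambda = -theta

def renyi_closed(lam, alpha):
    """Closed-form Renyi entropy of the exponential distribution with rate lam."""
    return -math.log(lam) - math.log(alpha) / (1.0 - alpha)

def plug_in_renyi(samples, alpha):
    """Renyi entropy estimate: closed form evaluated at the MLE."""
    return renyi_closed(mle_rate(samples), alpha)

rng = random.Random(0)  # fixed seed so that the sketch is reproducible
true_rate = 2.0
samples = [rng.expovariate(true_rate) for _ in range(100_000)]
print(mle_rate(samples), plug_in_renyi(samples, 0.5), renyi_closed(true_rate, 0.5))
```

The cost is a single pass over the observations followed by one formula evaluation, independent of the dimension-dependent cost of nearest-neighbor methods.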
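Appendix A

Rényi and Tsallis entropies and divergences for multivariate Gaussians

The probability density of a multivariate Gaussian centered at $\mu$ with covariance matrix $\Sigma$ is given by

\[
p(x;\mu,\Sigma) = \frac{1}{(2\pi)^{\frac{d}{2}} \sqrt{|\Sigma|}} \exp\left( -\frac{(x-\mu)^T \Sigma^{-1} (x-\mu)}{2} \right). \tag{105}
\]

Let us rewrite this density to fit the canonical decomposition of exponential families:

\begin{align}
p(x;\mu,\Sigma) &= \exp\left( -\frac{1}{2} x^T \Sigma^{-1} x + x^T \Sigma^{-1} \mu - \frac{1}{2} \mu^T \Sigma^{-1} \mu - \frac{1}{2} \log (2\pi)^d |\Sigma| \right) \tag{106} \\
&= \exp\left( \left\langle (x, x x^T), \left( \Sigma^{-1}\mu, -\frac{1}{2}\Sigma^{-1} \right) \right\rangle - F(\theta) \right), \tag{107}
\end{align}

with $\theta = (\Sigma^{-1}\mu, -\frac{1}{2}\Sigma^{-1})$ and $F(\theta) = \frac{1}{2}\log (2\pi)^d |\Sigma| + \frac{1}{2}\mu^T \Sigma^{-1} \mu$ (and $k(x) = 0$). The natural parameter $\theta = (\Sigma^{-1}\mu, -\frac{1}{2}\Sigma^{-1}) = (v, M)$ consists of two parts: a vector part $v$, and a symmetric negative definite matrix part $M \prec 0$. The inner product of $\theta = (v, M)$ and $\theta' = (v', M')$ is defined as

\[
\langle \theta, \theta' \rangle = v^T v' + \mathrm{tr}(M^T M'), \tag{108}
\]

where $\mathrm{tr}$ denotes the matrix trace (i.e., the sum of the diagonal elements).

Since $\Sigma = -\frac{1}{2}M^{-1}$, we have $|\Sigma| = \frac{1}{|-2M|}$ and $\mu = \Sigma v = -\frac{1}{2}M^{-1}v$. It follows that the log-normalizer expressed using the natural parameters is

\[
F(v, M) = \frac{d}{2}\log 2\pi + \frac{1}{2}\log\frac{1}{|-2M|} - \frac{1}{4} v^T M^{-1} v. \tag{109}
\]

Let us now write the term $F(\alpha\theta)$:

\begin{align}
F(\alpha\theta) &= F(\alpha v, \alpha M) \tag{110} \\
&= \frac{d}{2}\log 2\pi + \frac{1}{2}\log\frac{1}{|-2\alpha M|} - \frac{1}{4} (\alpha v)^T (\alpha M)^{-1} (\alpha v). \tag{111}
\end{align}

We shall use the fact that $|\alpha M| = \alpha^d |M|$ for $d$-dimensional matrices. It follows⁴ that

\begin{align}
H^R_\alpha(p) &= \frac{1}{1-\alpha} \left( F(\alpha\theta) - \alpha F(\theta) \right) \tag{112} \\
&= \frac{1}{1-\alpha} \left( \frac{(1-\alpha)d}{2}\log 2\pi + \frac{1}{2}\log\frac{1}{|-2\alpha M|} - \frac{\alpha}{2}\log\frac{1}{|-2M|} \right) \tag{113-114} \\
&= \frac{1}{1-\alpha} \left( \frac{(1-\alpha)d}{2}\log 2\pi + \frac{1-\alpha}{2}\log|\Sigma| - \frac{d}{2}\log\alpha \right) \tag{115-116} \\
&= \frac{d}{2}\log 2\pi + \frac{1}{2}\log|\Sigma| - \frac{d\log\alpha}{2(1-\alpha)}. \tag{117}
\end{align}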
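Appendix B

Kullback-Leibler divergence of exponential families as Bregman divergences

Let us prove that for two distributions $p \sim \mathcal{E}_F(\theta)$ and $q \sim \mathcal{E}_F(\theta')$ belonging to the same exponential family $\mathcal{E}_F$, we have

\[
\mathrm{KL}(p:q) = B_F(\theta':\theta). \tag{118}
\]

Proof: We first show that $\nabla F(\theta) = E[t(X)]$, with $t(X)$ the sufficient statistics:

\begin{align}
F(\theta) &= \log \int_x \exp(\langle t(x), \theta \rangle + k(x)) \, dx, \tag{119} \\
\nabla F(\theta) &= \frac{\int_x t(x) \exp(\langle t(x), \theta \rangle + k(x)) \, dx}{\int_x \exp(\langle t(x), \theta \rangle + k(x)) \, dx}. \tag{120}
\end{align}

Since $e^{F(\theta)} = \int_x \exp(\langle t(x), \theta \rangle + k(x)) \, dx$, we replace the denominator to get

\begin{align}
\nabla F(\theta) &= \int_x t(x) \exp(\langle t(x), \theta \rangle - F(\theta) + k(x)) \, dx \tag{121} \\
&= \int_x t(x) p_F(x;\theta) \, dx \tag{122} \\
&= E_\theta[t(x)]. \tag{123}
\end{align}

We are now ready to prove $\mathrm{KL}(p:q) = B_F(\theta':\theta)$:

\begin{align}
\mathrm{KL}(p:q) &= \int_x p_F(x;\theta) \log \frac{p_F(x;\theta)}{p_F(x;\theta')} \, dx \tag{124} \\
&= \int_x p_F(x;\theta) \left( F(\theta') - F(\theta) + \langle \theta - \theta', t(x) \rangle \right) dx \tag{125} \\
&= \int_x p_F(x;\theta) \left( B_F(\theta':\theta) + \langle \theta' - \theta, \nabla F(\theta) \rangle + \langle \theta - \theta', t(x) \rangle \right) dx \tag{126-127} \\
&= B_F(\theta':\theta) + \int_x p_F(x;\theta) \langle \theta' - \theta, \nabla F(\theta) - t(x) \rangle \, dx \tag{128} \\
&= B_F(\theta':\theta) - \int_x p_F(x;\theta) \langle \theta' - \theta, t(x) \rangle \, dx + \langle \theta' - \theta, \nabla F(\theta) \rangle \tag{129-130} \\
&= B_F(\theta':\theta) - \left\langle \theta' - \theta, \int_x p_F(x;\theta) t(x) \, dx \right\rangle + \langle \theta' - \theta, \nabla F(\theta) \rangle \tag{131-132} \\
&= B_F(\theta':\theta), \tag{133}
\end{align}

since $\nabla F(\theta) = E[t(X)]$. ■

⁴Note that the terms $-\frac{1}{4}(\alpha v)^T (\alpha M)^{-1} (\alpha v) + \frac{\alpha}{4} v^T M^{-1} v$ vanish, so that the Rényi entropy does not depend on the mean parameter $\mu$.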